
David/cmake config cuda capability #4481

Conversation

@davidlin409 (Contributor) commented Mar 18, 2021

When trying to train with the nnet1 train.sh script, I got the error shown below:

2021-03-18_11-12-35.log

It clearly indicates that the compiled CUDA capability does not match my GPU, which is a "GeForce GT 710".

Setting aside the fact that I am using nnet1 (I test against an old Kaldi commit where nnet1 works properly), the CMake project does not allow the user to set the CUDA capability (or maybe I just did not find how?).

To solve that, I propose adding an additional CACHED variable to the CMake project called "CUDA_CAPABILITY" (the name can be changed if needed), and using it to configure CUDA compilation. This fix is based on the configure script under the src/ folder. Originally, the configure script adds a whole list of gencode flags for the CUDA compiler; here I limit it to a single gencode so that CUDA images that will not be used are not generated.
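
In rough outline, the proposal looks like the sketch below (illustrative, not the literal diff; only the CACHED variable and the single -gencode line are the point):

set(CUDA_CAPABILITY "35" CACHE STRING "CUDA compute capability to generate device code for")
if(CUDA_FOUND)
  # One gencode entry instead of the whole list the configure script emits.
  list(APPEND CUDA_NVCC_FLAGS
       "-gencode=arch=compute_${CUDA_CAPABILITY},code=sm_${CUDA_CAPABILITY}")
endif()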

I have actually also ported the configure logic to the CMake project, but that causes a lot of changes, so I am using this smaller approach first to see whether the change is acceptable. Further modification can be discussed.

Thanks.

PS: I don't know why so many changes are shown; this pull request actually only changes two lines. The previous commit syncs the CMake patch into an old Kaldi commit so that a merge conflict does not occur.

@danpovey (Contributor) commented Mar 18, 2021 via email

CMakeLists.txt Outdated
@@ -12,11 +15,18 @@ if(NOT PYTHON_EXECUTABLE)
endif()

message(STATUS "Running gen_cmake_skeleton.py")
option(BUILD_SHARED_LIBS "Build shared Kaldi libraries." OFF)

Your previous PR got intermixed with this one. Perhaps you should merge Kaldi master into your branch (how, depends on your Git-Fu level); a simple merge will take care of this.

@kkm000 (Contributor) commented Mar 27, 2021

I think that for CUDA compilation we should require CMake 3.19 or above. In 3.18, support for CUDA as a first-class language was added (e.g., project(Kaldi LANGUAGES CXX CUDA), or enabling it later, if CUDA is found, with enable_language(CUDA)). In this PR, if I am subtracting the previous PR correctly, you're adding just two cached variables—is that correct?

I do not think that simply setting a single compute capability is adequate. For example, when I'm building Kaldi for a Google Cloud cluster, I want the option to use 3 different GPU types (the sensible ones for use with Kaldi among those they offer are the Tesla P100, V100 and T4), so I compile for capabilities 6.0, 7.0 and 7.5. Just a single capability won't cut it. And (remember the invariant?) let's leave the default as-is: it's much easier for the advanced user to narrow the choice than for a novice to figure out which architectures they need. Assume all they know is that they need to install CUDA to use it with Kaldi, and it should just work.
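
If the capability is settable at all, it has to be a list, not a scalar; roughly like this sketch (the name KALDI_CUDA_CAPABILITIES and the default list are illustrative only):

# A cached *list* of capabilities; override with e.g. cmake -DKALDI_CUDA_CAPABILITIES="60;70;75" ..
set(KALDI_CUDA_CAPABILITIES "35;50;60;61;70;75" CACHE STRING
    "Semicolon-separated list of CUDA compute capabilities to build for")
foreach(cap IN LISTS KALDI_CUDA_CAPABILITIES)
  list(APPEND CUDA_NVCC_FLAGS "-gencode=arch=compute_${cap},code=sm_${cap}")
endforeach()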

CMake can be installed as easily as we install other software, even if the user is not allowed to sudo. It is distributed as a fully static build and can be unpacked just about anywhere. For example, to install it into the tools/ subdirectory, as we do with the other stuff, the following script will work (but verify on a Mac, because one or both of the switches --strip-components and --exclude may be GNU extensions; also mind the -Linux- in the name, as it comes as a .dmg for macOS):

# Assuming cwd is kaldi/tools; no sudo needed when unpacking here
cmake_ver=3.19.4
mkdir -p ${cmake_ver}
curl -sSL https://github.com/Kitware/CMake/releases/download/v${cmake_ver}/cmake-${cmake_ver}-Linux-x86_64.tar.gz |
  tar xvz --strip-components=1 --exclude=doc{,/\*} -C ${cmake_ver}
ln -sfn ${cmake_ver} cmake

I'd prefer to have an option, as we currently do, named consistently with the Kaldi_ prefix, like

option(Kaldi_WITH_CUDA "Build Kaldi with CUDA support" ON)
. . . .
if(Kaldi_WITH_CUDA)
  if(CUDA_FOUND)
    enable_language(CUDA)
  else()
    # Look what our configure says; it should instruct the user what to do, like install CUDA and
    # how, or set `-DKaldi_WITH_CUDA=OFF`, or ignore this warning if they do not need CUDA.
    message(WARNING ...)
  endif()
endif()

I didn't have enough time to play with the new CUDA support; would appreciate it if you do.


Always start a new branch in Git for any separate change. Your old commit shows up blended with the new changes because you attempted to reuse branches; Git branches are very lightweight. Personally, I prefer to have remotes called golden for this repo and origin for my clone, to protect from an unexpected push to the main repo (I do that for literally every repo I prepare PRs for, even if I have no write access to the "golden" repo, so it's consistent across all my work). Also, I prefer not to have a local branch called master, unless it's my very own repo that I push into without going through code reviews. Basically, your workflow becomes like this:

# I have the current branch name and state in my prompt: 
# PS1='.... $(__git_ps1 "[%s]") ....' and
$ env | sort | grep ^GIT
GIT_PS1_SHOWDIRTYSTATE=true
GIT_PS1_SHOWSTASHSTATE=true
# in my environment. 

kkm@buba:~/work/kaldi[2103-4404-alignment]$ git fetch --all
Fetching golden
. . .
From git://github.com/kaldi-asr/kaldi
   d3c2be494..0970c29ce  master     -> golden/master
Fetching origin

# NOTE: Explicit golden/master in the next command:
kkm@buba:~/work/kaldi[2103-4404-alignment]$ git checkout golden/master -b my-new-work
Switched to a new branch 'my-new-work'
kkm@buba:~/work/kaldi[my-new-work]$
# ... hack-hack-hack ...
# At any time you can commit your changes and switch to your other work
# Read about the --amend switch to avoid creating unnecessary/incomplete commits
kkm@buba:~/work/kaldi[my-new-work]$ git co 2103-4404-alignment  # 'co' is my alias for 'checkout'
Switched to branch '2103-4404-alignment'

# And you can even keep your own changes on top of golden/master
kkm@buba:~/work/kaldi[2103-4404-alignment]$ git rebase golden/master
Successfully rebased and updated refs/heads/2103-4404-alignment.
# Not always successful, you may need to resolve conflicts. But you'll need to deal
# with the conflict in this case anyway when sending the PR. This option helps a lot:
$ git config --global rerere.enabled true
# HINT: The smaller the change, the less likely the conflict.

# Do not clutter local repo with branches you do not need. Read on git branch -d vs -D.
# When we commit into main repo, we use the GitHub "squash" option to put your changes on
# top of Kaldi master. The side effect is that deleting your accepted branch from local
# repo requires -D. It's normal.
kkm@buba:~/work/kaldi[2103-4404-alignment]$ git branch -d my-new-work
Deleted branch my-new-work (was 2171c2aba).

@kkm000 (Contributor) commented Mar 28, 2021

Oh by the way.

LOG ([5.5]:FinalizeActiveGpu():cudamatrix/cu-device.cc:308) The active GPU is [0]: GeForce GT 710 free:1489M, used:511M, total:2000M, free/total:0.744471 version 3.5

You do have 3.5, so something must have gone wrong during the build.

@davidlin409 (Contributor, Author) commented
Hi @kkm000:

Previously, when I built using the CUDA default (I remember it being capability 50?), I got the capability error. So the build itself is fine, but there is an error at run time.

As for the weird branching strategy, it is because I am testing the CUDA code with NNET1. In NNET1, training somehow errors out, which keeps me from verifying whether the build is successful. One possible way is to start from an old commit, make the changes, and then merge with current master. Or, if I can run some nnet2 example instead, that would also work. I will see how to proceed.

For the CUDA 3.19 feature, maybe I can check it and play around with it. But for this I will need to set up my personal laptop so that I can run CUDA under Ubuntu WSL (for some reason I don't want to install Visual Studio on my laptop). It is not appropriate to try it on my company computer.

For multiple CUDA capability support: I did once patch the CMake project to migrate the entire Makefile logic into it. That means I can, based on the CUDA driver version, include all supported CUDA capabilities in the CUDA build settings.

How about the idea of migrating the entire Makefile logic into the CMake project, while ensuring that the CUDA capability variable is CACHED so that people can tweak it later on?
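
Roughly, I have in mind something like the sketch below (variable names and version cut-offs are placeholders, not the exact configure table):

# Pick default capabilities from the detected CUDA version, as configure does.
if(CUDA_VERSION VERSION_GREATER_EQUAL 11.0)
  set(default_caps 35 50 60 61 70 75 80)
elseif(CUDA_VERSION VERSION_GREATER_EQUAL 10.0)
  set(default_caps 35 50 60 61 70 75)
else()
  set(default_caps 35 50 60 61)
endif()
set(KALDI_CUDA_CAPABILITIES "${default_caps}" CACHE STRING
    "CUDA compute capabilities to build; CACHED so people can tweak it later")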

As for the logic you mention above, I can integrate it with the Makefile logic, but the modification will be larger. I was previously afraid that the change would be too large, which is why I made this small change to see how it goes.

@davidlin409 force-pushed the david/cmake_config_cuda_capability branch from ea33e7b to 99de0de on March 30, 2021 at 02:47
@kkm000 (Contributor) commented Apr 2, 2021

For CUDA 3.19 feature, maybe I can check it and play around with it

You mean CMake 3.19, right? You can perfectly well install it on WSL. I do not think Visual Studio will be of much help. You can use VSCode on WSL with CMake, however; they integrate very well after some headbanging and cursing... I mean, there are some conventions that need to be observed. You'll need an X server on your laptop; I'm using VcXsrv, but Cygwin also works. What distro are you using?

The stuff looks too complicated, as if you were trying to work around CMake rather than use it. I'm on your change. In fact, I'm working on something similar today, so we'll figure out the best way to do it.

@kkm000 (Contributor) commented Apr 21, 2021

@davidlin409, here we go. CMake support for CUDA is extremely easy.

First, let's assume that the user has a reasonably recent version of CMake, 3.19+ to be exact. If they do not, downloading and untarring into ${HOME}/.local takes about 2 minutes. One and the same binary tarball works for all glibc6-based Linuces, and for macOS there is a separate package. (I once held an open Mac notebook in my hands while another guy typed on the keyboard to show me something, but that's about all the experience I have with Macs. I'd say they are pretty lightweight and sleek.)

I absolutely do not want build scripts peppered with checks "if CMake version is less than x.y, do this workaround".

Taking this as a precondition, here's all that is required to compile Kaldi with CMake and CUDA.

  1. Declare the project with C++ only (project(Kaldi LANGUAGES CXX ...)), but not yet CUDA.
  2. find_package(CUDA) is obsolete; find_package(CUDAToolkit) is the replacement. The old package doesn't follow the conventions of what is called Modern CMake, whose central tenet is "think about targets and dependencies, not directories and files and variables that name them." The old one defines a lot of unrelated variables on output; the replacement defines IMPORTED targets, plus CUDA version variables that will indeed come in handy.
  3. If everything else fails, absolutely everything, and there is no other solution than the use of a regex, then use a regex. But for version values, CMake offers quite nice version comparison operators (see the sketch after this list).
  4. Thou shalt not cache the CMAKE_CUDA_ARCHITECTURES variable. Ever. Just no. By way of how it's used, it's very hard to get rid of, short of deleting the cache (which is not a biggie, but still unclean).
  5. But if we do not set it, then it will be assigned the single arch 35 by enable_language(CUDA), which is not the best default. Besides, the user should have full control over the archs they want to build.
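
For item 3, a sketch of what I mean (CUDAToolkit_VERSION is the version variable set by find_package(CUDAToolkit)):

# Compare versions with CMake operators instead of regex-parsing version strings.
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 11.0)
  list(APPEND Kaldi_CUDA_ARCH 80)  # arch 80 (Ampere) needs CUDA 11+
endif()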

All of the above considered, here's a pretty much working prologue to enable CUDA:

option(Kaldi_WITH_CUDA "Build Kaldi with CUDA" ON)
set(Kaldi_CUDA_ARCH "" CACHE STRING "CUDA architectures to build. See the CUDA_ARCHITECTURES property.")

project(Kaldi LANGUAGES CXX VERSION 5.5)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_EXTENSIONS OFF)

# . . . right here or later, but before calling enable_testing() . . .
if(Kaldi_WITH_CUDA)
  # So we have CUDAToolkit_VERSION set:
  find_package(CUDAToolkit REQUIRED)
  # Unless the user specified other (non-empty) list of CUDA archs:
  if(NOT Kaldi_CUDA_ARCH)
    # Apply your/our logic that you already have implemented to select default values for platforms.
    # list(APPEND ....) comes handy here, since Kaldi_CUDA_ARCH *is* a list. You can grow it based
    # on checks using list(APPEND), or set unconditionally with 'set(Kaldi_CUDA_ARCH 60 61)'.
    # And did I happen to mention the version comparison operators?
    if(<architecture tests and CUDA version tests>)
      list(APPEND Kaldi_CUDA_ARCH 60 61 70)
    # more elseifs
    endif()
  endif()

  set(CMAKE_CUDA_ARCHITECTURES ${Kaldi_CUDA_ARCH})
  set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})
  set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}")
  string(APPEND CMAKE_CUDA_FLAGS_INIT " -default-stream per-thread -lineinfo")

  enable_language(CUDA)
endif()

This is basically it. .cu files are added together with .cc files, since CUDA is now a first-class language. When compiling and/or linking, add a dependency on the IMPORTED targets defined by the FindCUDAToolkit module. The library link dependency should be PUBLIC, because targets that depend on it also depend on the CUDA libraries and, most likely, includes. Extending a target like cudamatrix is as simple as

# Assuming cudamatrix has been added as a "normal" C++ library.
if(Kaldi_WITH_CUDA)
  target_compile_definitions(cudamatrix PUBLIC HAVE_CUDA)
  target_sources(cudamatrix PRIVATE cu-kernels.cu cu-kernels.h)
  target_link_libraries(cudamatrix PUBLIC
    CUDA::cudart_static
    CUDA::cuda_driver
    CUDA::cublas
    CUDA::cublasLt
    CUDA::curand
    CUDA::cufft
    CUDA::cusolver
    CUDA::cusparse
    CUDA::nvToolsExt)
endif()

If you compile a .so, the linker will immediately tell you which CUDA::xxxx references to add in this directory :)

Adding .h files is a good practice to make them visible in an IDE such as VSCode. CMake does not rely on them for dependency tracking; it infers dependencies automatically.

@kkm000 (Contributor) commented May 2, 2021

@davidlin409, please let me know if you're interested in developing a full-featured CMake build for Kaldi. No bad feelings if you decline, of course: we all have only so much time to give to FOSS.

I'll be able to advise you, but currently I'm out of idle cycles to actively participate.

This PR would have been good in the CMake 2.x times. There are new patterns and paradigms, called "Modern CMake," supported starting around v3.8, and "More Modern CMake," using features of v3.15. CUDA was added as a full programming language, on equal footing with C and CXX and whatnot, in v3.18, and fully stabilized by v3.19.

find_package(CUDA) has long been declared obsolete; the replacement is find_package(CUDAToolkit), which creates targets for CUDA libraries, not variables for library full pathnames. I cannot accept this patch as is: CMake grows (it's indeed a dynamic project, with a lot of features appearing in every minor version), and the obsolete code is more and more likely to break or be removed. Everything we accept, we support, and this is not the kind of code I'd like to be fixing when it breaks. :)

@kkm000 added the enhancement and waiting-for-feedback (Reporter's feedback has been requested) labels on May 2, 2021
@kkm000 (Contributor) commented May 2, 2021

BTW, for a CMake build on WSL, I personally find VSCode much less clunky than Visual Studio: https://code.visualstudio.com/docs/cpp/config-wsl. It is said that WSL2 support is experimental, but WSL2 has full hypervisor guest capabilities, so you may install VSCode into a WSL2 VM.

In fact, my biggest impetus for adding a CMake build is the IDE support, which VSCode provides nicely enough, especially with tools like clang-tidy and clang-format. I experimented with VS2019, but not much; I just do not know how it compares when properly configured.

@kkm000 (Contributor) commented May 8, 2021

Friendly ping. Is this ball back in my backyard?

@davidlin409 closed this on May 9, 2021
@davidlin409 deleted the david/cmake_config_cuda_capability branch on May 10, 2021 at 02:10
@kkm000 removed the waiting-for-feedback label on May 14, 2021
@kkm000 (Contributor) commented May 14, 2021

@davidlin409, thanks for your effort, and you are welcome back any time if you reconsider. I believe that building a complex project like Kaldi is a great opportunity to learn Modern CMake.
